Clustering ensemble algorithms based on improved genetic algorithm in cloud computing

XU Zhanyang, ZHENG Kezhang

Journal of Computer Applications 2018, 38 (2): 458-463. DOI: 10.11772/j.issn.1001-9081.2017071749

This work addresses three problems: unsupervised clustering lacks prior information about the data classes, the accuracy of base clusterings depends on the clustering algorithm used, and general clustering ensemble algorithms have high space complexity. To address them, a Clustering Ensemble algorithm based on Improved Genetic Algorithm (CEIGA) was proposed. Traditional clustering ensemble algorithms also cannot meet the time requirements of large-scale data processing, so a Parallel Clustering Ensemble algorithm based on Improved Genetic Algorithm (PCEIGA), implemented with Hadoop for cloud computing, was proposed as well. First, the base clustering partitions produced by the base clustering generation mechanism were relabeled and then encoded as the initial population of the improved Genetic Algorithm (GA). Second, the selection operator of the GA was improved to preserve the diversity of the base clusterings. Based on the improved selection operator, crossover and mutation were applied to the chromosomes, and the next-generation population was obtained with an elitist strategy to ensure the accuracy of the base clusterings. In this way, the final clustering ensemble result reached the global optimum and the accuracy of the algorithm was improved. To improve the efficiency of the proposed algorithms, two MapReduce processes were designed, and a Combine process was added to reduce communication among nodes. Finally, CEIGA, PCEIGA and four advanced clustering ensemble algorithms were compared on UCI data sets. The experimental results show that CEIGA outperforms the other advanced clustering ensemble algorithms. They also show that PCEIGA significantly reduces running time and improves efficiency without decreasing the accuracy of the clustering results.
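The relabel, encode and evolve steps can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' CEIGA, and every function name is hypothetical. Labels are assumed to be integers 0..k-1. Relabeling is done by brute-force matching against the first base partition, which is practical only for small k. Fitness is assumed to be the total position-wise agreement with the aligned base partitions. Plain tournament selection stands in for the paper's improved, diversity-preserving selection operator, because the abstract does not give its details.

```python
import itertools
import random

def relabel(partition, reference, k):
    """Permute the labels of `partition` to agree as much as possible with
    `reference` (brute force over all k! permutations; small k only)."""
    best_perm, best_hits = None, -1
    for perm in itertools.permutations(range(k)):
        hits = sum(perm[a] == b for a, b in zip(partition, reference))
        if hits > best_hits:
            best_perm, best_hits = perm, hits
    return [best_perm[a] for a in partition]

def fitness(chrom, base):
    # Assumed fitness: total position-wise agreement with the aligned base
    # partitions. Labels are comparable only because all were relabeled
    # against the same reference partition.
    return sum(sum(x == y for x, y in zip(chrom, b)) for b in base)

def select(pop, base, rng, size=3):
    # Hypothetical stand-in: plain tournament selection, not the paper's
    # diversity-preserving operator.
    return max(rng.sample(pop, min(size, len(pop))), key=lambda c: fitness(c, base))

def crossover(p1, p2, rng):
    # Uniform crossover: each object's label comes from one of the parents.
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

def mutate(chrom, k, rng, rate=0.05):
    # Reassign each object's label to a random cluster with probability `rate`.
    return [rng.randrange(k) if rng.random() < rate else g for g in chrom]

def consensus_ga(base_partitions, k, generations=50, seed=0):
    rng = random.Random(seed)
    ref = base_partitions[0]
    base = [relabel(p, ref, k) for p in base_partitions]
    pop = [list(p) for p in base]  # aligned base partitions = initial population
    for _ in range(generations):
        # Elitist strategy: the best chromosome survives unchanged.
        elite = max(pop, key=lambda c: fitness(c, base))
        children = [elite]
        while len(children) < len(pop):
            p1 = select(pop, base, rng)
            p2 = select(pop, base, rng)
            children.append(mutate(crossover(p1, p2, rng), k, rng))
        pop = children
    return max(pop, key=lambda c: fitness(c, base))
```

Consider three toy base partitions of six objects, `[[0,0,0,1,1,1], [1,1,1,0,0,0], [0,0,1,1,1,1]]`, with k = 2. The second partition is the first with its labels swapped, so relabeling makes the two identical. `consensus_ga` then returns `[0, 0, 0, 1, 1, 1]`, and elitism guarantees that the best chromosome found so far is never lost.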
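The following single-process simulation of one MapReduce job illustrates why a Combine step reduces communication among nodes. It rests on a hypothetical decomposition: one job evaluates the fitness of every chromosome by splitting the data objects across mappers. The code is plain Python rather than the Hadoop API, and it does not reproduce the paper's two MapReduce processes. The combiner pre-aggregates each mapper's output, so only one key-value pair per chromosome per mapper is shuffled to the reducer.

```python
from collections import defaultdict

def map_phase(chunk, population, base):
    """Hypothetical mapper: for one slice of object indices, emit
    (chromosome_id, agreement) for every object in the slice."""
    for cid, chrom in enumerate(population):
        for i in chunk:
            yield cid, sum(chrom[i] == b[i] for b in base)

def combine(pairs):
    """Combiner: sum values per key on the mapper node, so each mapper
    ships one pair per chromosome instead of one per object."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_phase(all_pairs):
    """Reducer: sum the (possibly pre-combined) partial agreements per key."""
    totals = defaultdict(int)
    for key, value in all_pairs:
        totals[key] += value
    return dict(totals)

def parallel_fitness(population, base, n_mappers=2, use_combiner=True):
    n = len(base[0])
    # Assign objects round-robin to the simulated mappers.
    chunks = [range(m, n, n_mappers) for m in range(n_mappers)]
    shuffled = []  # pairs that would cross the network to the reducer
    for chunk in chunks:
        pairs = list(map_phase(chunk, population, base))
        shuffled.extend(combine(pairs) if use_combiner else pairs)
    return reduce_phase(shuffled), len(shuffled)
```

In a toy case with two chromosomes, six objects and two mappers, the combiner cuts the shuffled pairs from 12 to 4, and the reduced totals stay the same. Summation is associative and commutative, which is why the same logic can safely run both as a combiner and as a reducer.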
